Updated #33
Conversation
The disconnected-graph check ran a pure-Python union-find over all n*k edges on every fit, even when the graph was already connected. Replace it with scipy.sparse.csgraph.connected_components (already a dependency), seed the union-find from the resulting labels, and vectorize _labels. Component detection on a connected 200k x 20 graph: ~2.9s -> ~0.16s (~19x). Outputs unchanged (same components, bridges, connectivity); tests/test_connect.py unit tests pass.
_make_layout_config stamped deterministic=True/seed onto the layout config only when none was passed; a user-supplied layout_config was returned verbatim with its default deterministic=False, so the layout ran multi-threaded and nondeterministically even under reproducible=True. Stamp deterministic/seed in both paths so the estimator owns layout reproducibility regardless of whether a config is passed (the layout's own multi-threading gives no measurable speedup, so this is free). Document that these fields are set in place on the passed object, and add a regression test asserting a reproducible=True fit with a passed LayoutConfig() is bit-identical across runs.
afloresep
left a comment
There was a problem hiding this comment.
Everything looks good. I'll change some of the pure python code in connect.py with spicy code which uses C which is much faster while achieving the same result
per my testing:
(scipy) : 1 components in 11.9 ms
(python union) : 1 components in 467.8 ms
speedup : 39x counts agree
|
Im also adding some changes on the |
…t OGDF test_estimator.py imported LayoutConfig at module top, but tmap.layout only exports it when the OGDF extension is built. pytest-core runs without OGDF, so collection failed with ImportError, failing the whole job. Import it inside the OGDF-gated regression test instead (matching test_end_to_end.py / test_layout_ogdf.py), so collection never references it when OGDF is absent.
Pre-existing one-line-docstring and formatting misses in the PR's new files, flagged by the repo's `ruff format --check src/ tests/`.
The lint job runs `ruff format --check` then `ruff check`; the prior format failure masked this pre-existing UP035 in connect.py. typing.Callable is deprecated in favor of collections.abc.Callable.
Oh crap, I forgot to port that one. I was experimenting in Python to avoid building it again and again. |
Two main changes to the layout pipeline:
I also included some new examples/tests (untangle_demo.py, adaptive_coef_exp_3panel.py, molecules_tmap_legacy.py, test_connect.py) and a 200k-row ChEMBL example dataset.